Search CORE

144 research outputs found

Selected inversion as key to a stable Langevin evolution across the QCD phase boundary

Author: Bloch Jacques
Schenk Olaf
Publication venue: 'EDP Sciences'
Publication date: 27/07/2017
Field of study

We present new results of full QCD at nonzero chemical potential. In PRD 92, 094516 (2015) the complex Langevin method was shown to break down when the inverse coupling decreases and enters the transition region from the deconfined to the confined phase. We found that the stochastic technique used to estimate the drift term can be very unstable for indefinite matrices. This may be avoided by using the full inverse of the Dirac operator, which is, however, too costly for four-dimensional lattices. The major breakthrough in this work was achieved by realizing that the inverse elements necessary for the drift term can be computed efficiently using the selected inversion technique provided by the parallel sparse direct solver package PARDISO. In our new study we show that no breakdown of the complex Langevin method is encountered and that simulations can be performed across the phase boundary.Comment: 8 pages, 6 figures, Proceedings of the 35th International Symposium on Lattice Field Theory, Granada, Spai

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)

Directory of Open Access Journals

Local time stepping on high performance computing architectures: mitigating CFL bottlenecks for large-scale wave propagation

Author: Rietmann Max
Schenk Olaf
Publication venue
Publication date: 18/06/2015
Field of study

Modeling problems that require the simulation of hyperbolic PDEs (wave equations) on large heterogeneous domains have potentially many bottlenecks. We attack this problem through two techniques: the massively parallel capabilities of graphics processors (GPUs) and local time stepping (LTS) to mitigate any CFL bottlenecks on a multiscale mesh. Many modern supercomputing centers are installing GPUs due to their high performance, and extending existing seismic wave-propagation software to use GPUs is vitally important to give application scientists the highest possible performance. In addition to this architectural optimization, LTS schemes avoid performance losses in meshes with localized areas of refinement. Coupled with the GPU performance optimizations, the derivation and implementation of an Newmark LTS scheme enables next-generation performance for real-world applications. Included in this implementation is work addressing the load-balancing problem inherent to multi-level LTS schemes, enabling scalability to hundreds and thousands of CPUs and GPUs. These GPU, LTS, and scaling optimizations accelerate the performance of existing applications by a factor of 30 or more, and enable future modeling scenarios previously made unfeasible by the cost of standard explicit time-stepping schemes

RERO DOC Digital Library

Matching-based preprocessing algorithms to the solution of saddle-point problems in large-scale nonconvex interior-point optimization

Author: Hagemann Michael
Schenk Olaf
Wächter Andreas
Publication venue
Publication date: 18/06/2018
Field of study

Interior-point methods are among the most efficient approaches for solving large-scale nonlinear programming problems. At the core of these methods, highly ill-conditioned symmetric saddle-point problems have to be solved. We present combinatorial methods to preprocess these matrices in order to establish more favorable numerical properties for the subsequent factorization. Our approach is based on symmetric weighted matchings and is used in a sparse direct LDL T factorization method where the pivoting is restricted to static supernode data structures. In addition, we will dynamically expand the supernode data structure in cases where additional fill-in helps to select better numerical pivot elements. This technique can be seen as an alternative to the more traditional threshold pivoting techniques. We demonstrate the competitiveness of this approach within an interior-point method on a large set of test problems from the CUTE and COPS sets, as well as large optimal control problems based on partial differential equations. The largest nonlinear optimization problem solved has more than 12 million variables and 6 million constraint

RERO DOC Digital Library

Automatic code generation and tuning for stencil kernels on modern shared memory architectures

Author: Burkhart Helmar
Christen Matthias
Schenk Olaf
Publication venue
Publication date: 18/06/2018
Field of study

In this paper, we present Patus, a code generation and auto-tuning framework for stencil computations targeted at multi- and manycore processors, such as multicore CPUs and graphics processing units. Patus, which stands for "Parallel Autotuned Stencils,” generates a compute kernel from a specification of the stencil operation and a strategy which describes the parallelization and optimization to be applied, and leverages the autotuning methodology to optimize strategy-specific parameters for the given hardware architectur

RERO DOC Digital Library